A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection
Abstract
Zero-shot object detection (ZSD) aims to detect both the object categories that appear in the training phase and those never observed before testing, which makes it a challenging yet anticipated task in the community. Current approaches tackle this problem by borrowing the feature synthesis techniques used in zero-shot image classification (ZSC) without delving into the problems inherent to ZSD.
In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC, namely severe intra-class variation, complex category co-occurrence, and the open test scenario, and reveal how they interfere with the region feature synthesis process.
Methodology
In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the following mechanisms:
1. Intra-class Semantic Diverging (IntraSD): To overcome the inadequate intra-class diversity problem.
2. Inter-class Structure Preserving (InterSP): To address the insufficient inter-class separability issue.
3. Cross-Domain Contrast Enhancing (CrossCE): To solve the weak inter-domain contrast problem.
Moreover, in designing the overall learning framework, we develop an asynchronous memory container (AMC) that explores the cross-domain relationship between the seen-class domain and the unseen-class domain in order to reduce the overlap between their distributions. Based on AMC, we also propose a memory-assisted ZSD inference process to further boost prediction accuracy.
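The abstract does not describe how AMC is implemented, so the following is only a rough sketch of the general idea of a class-wise feature memory, not the paper's design. It assumes a hypothetical `FeatureMemory` class that keeps a bounded FIFO queue of region features per class and, at inference, compares a query feature against per-class mean prototypes by cosine similarity. The class name, the FIFO policy, the prototype averaging, and the `seen:`/`unseen:` label prefixes are all assumptions made for illustration; the asynchronous update itself is not modeled.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class FeatureMemory:
    """Hypothetical per-class FIFO memory of region features (illustrative only)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.slots = {}  # label -> deque of feature vectors

    def write(self, label, feature):
        # Oldest feature is dropped automatically once the queue is full.
        queue = self.slots.setdefault(label, deque(maxlen=self.capacity))
        queue.append(list(feature))

    def prototype(self, label):
        # Mean of the features currently stored for this class.
        feats = self.slots[label]
        dim = len(feats[0])
        return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]

    def nearest_class(self, feature):
        # Label whose prototype is most similar to the query feature.
        return max(self.slots, key=lambda lb: cosine(feature, self.prototype(lb)))


mem = FeatureMemory(capacity=2)
mem.write("seen:cat", [1.0, 0.0])
mem.write("seen:cat", [1.0, 0.2])
mem.write("unseen:zebra", [0.0, 1.0])
print(mem.nearest_class([0.9, 0.1]))
```

In a memory-assisted inference step of this kind, the similarity to stored seen-class and unseen-class prototypes could be combined with the detector's own class scores; how the paper actually does this is not stated in the abstract.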
Results
To evaluate the proposed approach, we conduct comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets, on which it achieves superior performance.
Notably, we achieve new state-of-the-art performance on the MS-COCO dataset:
• 64.0% Recall@100 with IoU = 0.4
• 60.9% Recall@100 with IoU = 0.5
• 55.5% Recall@100 with IoU = 0.6
• 15.1% mAP with IoU = 0.5
These results are obtained under the 48/17 category split setting. Meanwhile, the experiments on the DIOR dataset establish the first benchmark for evaluating zero-shot object detection performance on remote sensing images.
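For readers unfamiliar with the metric, Recall@100 at a given IoU threshold can be sketched as follows. This is a simplified illustration and not the evaluation code used in the paper: it assumes boxes in `(x1, y1, x2, y2)` format, counts a ground-truth box as recalled if any of the top-K detections of the same class overlaps it with IoU at or above the threshold, and does not enforce one-to-one matching. The function names `box_iou` and `recall_at_k` are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(detections, ground_truths, k=100, iou_thr=0.5):
    """Count recalled ground truths for one image.

    detections: list of (score, label, box); ground_truths: list of (label, box).
    Returns (number of recalled ground truths, number of ground truths).
    """
    # Keep only the K highest-scoring detections.
    top = sorted(detections, key=lambda d: d[0], reverse=True)[:k]
    matched = 0
    for gt_label, gt_box in ground_truths:
        if any(label == gt_label and box_iou(box, gt_box) >= iou_thr
               for _, label, box in top):
            matched += 1
    return matched, len(ground_truths)


dets = [(0.9, "cat", (0, 0, 10, 10)), (0.8, "dog", (20, 20, 30, 30))]
gts = [("cat", (0, 0, 10, 8)), ("dog", (50, 50, 60, 60))]
hit, total = recall_at_k(dets, gts, k=100, iou_thr=0.5)
print(hit / total)
```

Dataset-level recall would sum the matched and total counts over all test images before dividing; raising `iou_thr` from 0.4 to 0.6 makes the match criterion stricter, which is consistent with the decreasing recall values listed above.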